Some personal and computational notes loosely based on the analytical work I have done in response to COVID-19 pandemic in Scotland. In my GIS Analyst role I use wide range of proprietary and open source GIS software as well as tend to blend R and Python languages. Here is a flavor of my work in open and reproducible notebook, following a Donald Knuth’s “literate programming” paradigm.
First,all packages necessary for the analysis.
library(dplyr)
library(tidyr)
library(lubridate)
library(readr)
library(ggplot2)
library(rvest)
library(sf)
library(stringr)
library(tmap)
library(kableExtra)
library(gghighlight)
library(hrbrthemes)
library(gganimate) # animate plots
library(geofacet)
library(zoo) # moving avaragesNow, lets read the data from the Data Science Scotland repository
cases_scot = read_csv("https://raw.githubusercontent.com/DataScienceScotland/COVID-19-Management-Information/master/COVID19%20-%20Daily%20Management%20Information%20-%20Scotland%20-%20Testing.csv") %>%
select(Date,'Testing - Cumulative people tested for COVID-19 - Positive') %>%
rename(total = 'Testing - Cumulative people tested for COVID-19 - Positive') %>%
mutate(date = as.Date(Date, "%d-%b-%Y"))
# deaths
deaths_scot = read_csv("https://raw.githubusercontent.com/DataScienceScotland/COVID-19-Management-Information/master/COVID19%20-%20Daily%20Management%20Information%20-%20Scotland%20-%20Deaths.csv") %>%
rename(total = 'Number of COVID-19 confirmed deaths registered to date') %>%
mutate(date = as.Date(Date, "%d-%b-%Y"))Since we only have on variable with total cases and deaths, we are going to calculate additional attributes: daily new cases, percentage change and seven days rolling average.
#cases
cases_scot = cases_scot %>%
mutate(new_cases = (total - lag(total)), # new daily cases
pct_change = new_cases / lag(total) * 100, # percentage change
cases_07da = rollmean(new_cases, k = 7, fill = NA)) # 7 days rolling average
# deaths
deaths_scot = deaths_scot %>%
mutate(new_deaths = (total - lag(total)), # new daily cases
pct_change = new_deaths / lag(total) * 100, # percentage change
deaths_07da = rollmean(new_deaths, k = 7, fill = NA)) # 7 days rolling averageWe are ready to plot our variables using ggplot package
ggplot()+
geom_line(data = cases_scot, aes(x=Date, y=new_cases), color='gray') +
geom_line(data = cases_scot, aes(x=Date, y=cases_07da), color='#ba0000') +
labs( title = "Scotland's 7 days rolling average COVID-19 cases",
subtitle = "Between March 2020 and March 2021",
caption="Source:https://github.com/DataScienceScotland/COVID-19-Management-Information ",
y = "Cases",
x = "Month") +
scale_x_date(date_breaks = '1 month', date_labels = '%m %y') +
scale_y_log10() +
hrbrthemes::theme_ipsum()We can also create a map showing latest cases at NHS Health Board level. But first we need to get the data for each geograph unit.
# load data
cases_by_hb = read_csv("https://raw.githubusercontent.com/DataScienceScotland/COVID-19-Management-Information/master/COVID19%20-%20Daily%20Management%20Information%20-%20Scottish%20Health%20Boards%20-%20Cumulative%20cases.csv")
# pivot data to long format nd select latest date
last_cases_by_hb_long = cases_by_hb %>%
pivot_longer(!Date,names_to = "hb_name", values_to = "cases") %>%
mutate(cases = as.numeric(cases)) %>%
filter(Date == max(Date))
# load mid 2019 population estimatie
pop_hb = read_csv("data/hb_pop_2019.csv")
# first join: cases with population, also calculate rate per 100k
cases_by_hb_pop = left_join(last_cases_by_hb_long, pop_hb) %>%
mutate(
rate_100k = cases / hb_pop * 100000
)
# load Health Board geographies
health_boards = st_read("data/SG_NHS_HealthBoards_2019.shp", quiet = TRUE)
# second join: health board boundaries
cases_hb_rate100k = left_join(health_boards,cases_by_hb_pop, by = c("HBCode" = "hb_code"))Finally, we are ready to produce the map using tmap package. Note that we are saving the output to png file.
map = tm_shape(cases_hb_rate100k) +
tm_polygons(col = "rate_100k",
title = "Data from 17th March 2021 \nNumber of confirmed cases per 100,000 people",
lwd = 1,
border.col = "white",
style = "equal",
n = 5,
palette="cividis" # choose between reds or cividis
# legend.format=list(fun=function(x) paste0(formatC(x, digits=0, format="f")))
) +
tm_layout(
# fontfamily = "arial",
title = "Total COVID - 19 cases \nper 100,000 people, by NHS Health Board",
title.size = 0.9,
legend.title.size = 1,
attr.outside= TRUE,
attr.outside.position = "bottom",
attr.outside.size = .1,
attr.just = c("left", "top"),
inner.margins = 0.02,
outer.margins = c(0,0.01,0.01,0.01),
asp = 4/6,
frame = F,
design.mode = F
) +
tm_credits("Cases data from: https://www.gov.scot/coronavirus-covid-19\nPopulation data: NRS Population Estimates, mid-2018\nBoundaries: SpatialData.gov.scot",
col = "grey50",
position=c("left","top"),
align = "left",
size = 0.325)
# save graphic to png file
tmap_save(
tm = map,
filename = "hb_map_cases_rate.png",
height = 6,
width = 4,
units = "in",
asp = 4/6
)## Map saved to /Users/michal/Documents/covid/hb_map_cases_rate.png
## Resolution: 1200 by 1800 pixels
## Size: 4 by 6 inches (300 dpi)
Ok, map is ready !
I can’t recommend tmap package enough.
sessionInfo()## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] zoo_1.8-9 geofacet_0.1.10 gganimate_1.0.5 hrbrthemes_0.8.0
## [5] gghighlight_0.3.0 kableExtra_1.3.4 tmap_3.0 stringr_1.4.0.9000
## [9] sf_0.9-6 rvest_0.3.5 xml2_1.3.2 ggplot2_3.3.2
## [13] readr_1.4.0 lubridate_1.7.9.2 tidyr_1.1.2 dplyr_1.0.2
##
## loaded via a namespace (and not attached):
## [1] webshot_0.5.2 RColorBrewer_1.1-2 progress_1.2.2
## [4] httr_1.4.2 tools_4.0.2 R6_2.5.0
## [7] KernSmooth_2.23-17 rgeos_0.5-2 DBI_1.1.0
## [10] colorspace_1.4-1 raster_3.1-5 withr_2.3.0
## [13] sp_1.4-5 tidyselect_1.1.0 gridExtra_2.3
## [16] prettyunits_1.1.1 leaflet_2.0.3 curl_4.3
## [19] compiler_4.0.2 extrafontdb_1.0 cli_2.3.1
## [22] leafem_0.1.1 scales_1.1.1 classInt_0.4-3
## [25] systemfonts_0.2.1 digest_0.6.27 rmarkdown_2.6.4
## [28] svglite_1.2.3 base64enc_0.1-3 dichromat_2.0-0
## [31] jpeg_0.1-8.1 pkgconfig_2.0.3 htmltools_0.5.1.1
## [34] extrafont_0.17 highr_0.8 htmlwidgets_1.5.1
## [37] rlang_0.4.10 rstudioapi_0.13 generics_0.1.0
## [40] farver_2.0.3 crosstalk_1.1.0.1 magrittr_2.0.1
## [43] Rcpp_1.0.6 munsell_0.5.0 abind_1.4-5
## [46] gdtools_0.2.2 lifecycle_0.2.0 stringi_1.5.3
## [49] leafsync_0.1.0 yaml_2.2.1 tmaptools_3.0
## [52] geogrid_0.1.1 grid_4.0.2 parallel_4.0.2
## [55] ggrepel_0.8.2 crayon_1.4.1 lattice_0.20-41
## [58] stars_0.4-1 hms_0.5.3 knitr_1.31
## [61] pillar_1.4.7 codetools_0.2-16 imguR_1.0.3
## [64] XML_3.99-0.3 glue_1.4.2 evaluate_0.14
## [67] gifski_0.8.6 png_0.1-7 vctrs_0.3.6
## [70] tweenr_1.0.1 Rttf2pt1_1.3.8 gtable_0.3.0
## [73] purrr_0.3.4 assertthat_0.2.1 xfun_0.22
## [76] lwgeom_0.2-3 e1071_1.7-4 rnaturalearth_0.1.0
## [79] class_7.3-17 viridisLite_0.3.0 tibble_3.0.4
## [82] units_0.6-7 ellipsis_0.3.1
A work by Michal Michalski